Abstract

The value of an HPC system to a user depends on many factors, including execution time
on a particular problem, software development time, and both direct and indirect
costs. The DARPA High Productivity Computing Systems (HPCS) program is focused on providing a
new generation of economically viable high productivity computing systems for the
national security and industrial user community in the 2007-2010 timeframe. The
goal is to provide systems that double in productivity (or value) every 18 months.
This program has initiated a fundamental reassessment of how we define and
measure performance, programmability, portability, robustness and, ultimately,
productivity in the HPC domain. This talk will describe the HPCS efforts to
develop a productivity assessment framework (see Figure 1), characterize HPC user
workflows, and define the scope of the target applications.

Introduction 

The HPCS program seeks to create trans-Petaflop systems of significant value to the
Government HPC community. Such value will be determined by assessing many
additional factors beyond just theoretical peak flops (i.e., “Machoflops”). Ultimately, the
goal is to decrease the time-to-solution, which means decreasing both the execution time
and development time of an application on a particular system. Evaluating the
capabilities of a system with respect to these goals requires a different assessment
process. The goal of the HPCS assessment activity is to prototype and baseline a process
that can be transitioned to the acquisition community for 2010 procurements.

Development  Time 

The most novel part of the assessment activity will be the effort to measure/predict the
ease or difficulty of developing HPC applications. Currently, there is no quantitative
methodology for comparing the development time impact of various HPC programming
technologies. To achieve this goal, we will use a variety of tools, including:

• Application of code metrics on existing HPC codes
• Several prototype analytic models of development time
• Interface characterization (e.g., language, parallel model, memory model, ...)
• Scalable benchmarks designed for testing both performance and programmability
• Classroom software engineering experiments
• Human validated demonstrations

These  tools  will  provide  the  baseline  data  necessary  for  modeling  development  time  and 
allow  the  new  technologies  developed  under  HPCS  to  be  assessed  quantitatively. 

Execution Time

The  execution  time  part  of  the  assessment  activity  will  leverage  the  strong  heritage  in  the 
HPC  performance  modeling  community.  This  will  include  analytic,  source  code,  and 
executable  based  tools  for  analyzing  the  projected  performance  of  various  applications  on 
current,  next  generation  and  HPCS  designs.  The  execution  time  and  development  time 
activities  will  be  strongly  coupled  so  as  to  provide  a  clear  picture  to  the  community  of  the 
tradeoffs  that  exist  between  execution  time  and  development  time. 

Figure 1: HPCS Assessment Framework. The goal of the framework is to provide a
mechanism for integrating system-specific capabilities with user-specific needs to assess
the value of a particular machine for a particular mission.

High Productivity Computing Systems

-Program  Overview- 


>  Create  a  new  generation  of  economically  viable  computing  systems  and  a 
procurement  methodology  for  the  security/industrial  community  (2007  -  2010) 

Motivation: Metrics Drive Designs

“You  get  what  you  measure” 

HPCS Productivity Factors: Performance, Programmability,
Portability, and Robustness are very closely coupled with each workflow


Lone  Researcher 


•  Missions  (development):  Cryptanalysis,  Signal  Processing,  Weather, 
Electromagnetics 

•  Process  Overview 

-  Goal:  solve  a  compute  intensive  domain  problem:  crack  a  code,  incorporate  new 
physics,  refine  a  simulation,  detect  a  target 

-  Starting  point:  inherited  software  framework  (~3,000  lines) 

-  Modify  framework  to  incorporate  new  data  (~10%  of  code  base) 

-  Make  algorithmic  changes  (~10%  of  code  base);  Test  on  data;  Iterate 

-  Progressively  increase  problem  size  until  success 

-  Deliver:  code,  test  data,  algorithm  specification 

•  Environment  overview 

-  Duration: months; Team size: 1

-  Machines: workstations (some clusters), HPC decreasing

-  Languages: FORTRAN, C → Matlab, Python

-  Libraries:  math  (external)  and  domain  (internal) 

•  Software  productivity  challenges 

-  Focus  on  rapid  iteration  cycle 

-  Frameworks/libraries  often  serial 

Experiment 

Domain Researcher (special case)


•  Scientific Research: DoD HPCMP Challenge Problems, NNSA/ASCI Milestone
Simulations

•  Process Overview

-  Goal: Use HPC to perform Domain Research

-  Starting point: Running code, possibly from an Independent Software Vendor (ISV)

-  NO modifications to codes

-  Repeatedly run the application with user defined optimization

•  Environment overview

-  Duration: months; Team size: 1-5

-  Machines: workstations (some clusters), HPC

-  Languages: FORTRAN, C

-  Libraries: math (external) and domain (internal)

•  Software productivity challenges: None!

•  Productivity challenges

-  Robustness  (reliability) 

-  Performance 

-  Resource  center  operability 

Enterprise Design


•  Missions (development): Weapons Simulation, Image Processing

•  Process Overview

-  Goal: develop or enhance a system for solving a compute intensive domain
problem: incorporate new physics, process a new surveillance sensor

-  Starting point: software framework (~100,000 lines) or module (~10,000 lines)

-  Define sub-scale problem for initial testing and development

-  Make algorithmic changes (~10% of code base); Test on data; Iterate

-  Progressively increase problem size until success

-  Deliver: code, test data, algorithm specification, iterate with user

•  Environment overview

-  Duration: ~1 year; Team size: 2-20

-  Machines: workstations, clusters, HPC

-  Languages: FORTRAN, C → C++, Matlab, Python

-  Libraries: open math and communication libraries

•  Software productivity challenges

-  Legacy portability essential

-  Avoid machine specific optimizations (SIMD, DMA, ...)

-  Later must convert high level language code

Simulation 

MITRE  MIT  Lincoln  Laboratory  ISI 

Slide-11 

HPCS  Productivity 


Production 


•  Missions  (production):  Cryptanalysis,  Sensor  Processing,  Weather 

•  Process  Overview 

-  Goal:  develop  a  system  for  fielded  deployment  on  an  HPC  system 

-  Starting  point:  algorithm  specification,  test  code,  test  data,  development  software 
framework 

-  Rewrite test code into development framework; Test on data; Iterate

-  Port  to  HPC;  Scale;  Optimize  (incorporate  machine  specific  features) 

-  Progressively  increase  problem  size  until  success 

-  Deliver:  system 

•  Environment  overview 

-  Duration:  ~1  year  Team  size:  2-20 

-  Machines:  workstations  and  HPC  target 

-  Languages: FORTRAN, C → C++

•  Software  productivity  challenges 

-  Conversion  of  higher  level  languages 

-  Parallelization  of  serial  library  functions 

-  Parallelization  of  algorithm 

-  Sizing  of  HPC  target  machine 

HPC Workflow SW Technologies

Example Existing Code Analysis

Controlled experiments can potentially measure the impact of different
technologies and quantify development time and execution time tradeoffs


Novel  Metrics 


•  HPC Software Development often involves changing code (Δx)
to change performance (Δy)

-  1st order size metrics measure scale of change E(Δx)

-  2nd order metrics would measure nature of change E(Δx²)

•  Example:  2  Point  Correlation  Function 

-  Looks  at  “distance”  between  code  changes 

-  Determines  if  changes  are  localized  (good)  or  distributed  (bad) 

•  Other Zany Metrics

-  See Cray talk
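
One way to read the 2 point correlation idea above, as a minimal sketch: treat each code change as a line number, collect the distances between every pair of changed lines, and histogram them. The function name `change_distance_histogram`, the bin width, and the sample line numbers are illustrative assumptions, not part of any HPCS metric definition.

```python
from collections import Counter
from itertools import combinations


def change_distance_histogram(changed_lines, bin_width=10):
    """Histogram of pairwise distances between changed line numbers.

    Hypothetical helper: bins |i - j| for every pair of changed lines.
    Counts in the low bins suggest localized changes; counts in the
    high bins suggest changes distributed across the file.
    """
    hist = Counter()
    for a, b in combinations(sorted(set(changed_lines)), 2):
        hist[(b - a) // bin_width] += 1
    return dict(hist)


# Made-up example data: one clustered edit and one scattered edit.
localized = [100, 101, 102, 103]
distributed = [5, 400, 900, 1500]

print(change_distance_histogram(localized))
print(change_distance_histogram(distributed))
```

In this toy input every pair of the clustered edit falls in the first bin, while the scattered edit spreads its pairs over widely separated bins.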

[Figure: axis label "Code distance"]

Outline


•  Introduction 

•  Workflows 

•  Metrics 

•  Models & Benchmarks

-  Prototype Models

-  A&P Benchmarks

•  Schedule  and  Summary 

Prototype Productivity Models


•  Special Model with Work Estimator (Sterling)

$\Psi_w$ is a ratio with numerator $S_P \times E \times A$; the denominator is a cost term in $c_f$, $(c_m + c_o)$ and $T$ that is not fully legible in the source

•  Utility (Snir)

$P(S, A, U(\cdot)) = \min_{Cost} \dfrac{U(T(S, A, Cost))}{Cost}$

•  Productivity Factor Based (Kepner)

-  (useful ops / second) divided by Hardware Cost, with useful ops ranging from GUPS to Linpack

-  productivity factor: combines Language Level, Parallel Model, Portability, Availability and Maintenance terms

-  (productivity factor) × (mission factor)

•  Efficiency and Power (Kennedy, Koelbel, Schreiber)

$T(P_L) = I(P_L) + r E(P_L)$, together with the ratio $E(P_L) / E(P_0)$

•  CoCoMo II (software engineering community)

$[\text{Effort Multipliers}] \times A \times [\text{Size}]^{\text{Scale Factors}}$

•  Least Action (Numrich)

$\delta S = 0$

•  Time-To-Solution (Kogge)

[Figure: axis marked hour, day, week, month, year; axis label "Execution Time"]

HPCS  has  triggered  ground  breaking  activity  in  understanding  HPC  productivity 

-Community  focused  on  quantifiable  productivity  (potential  for  broad  impact) 

-Numerous  proposals  provide  a  strong  foundation  for  Phase  2 
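
As a hedged illustration of the Efficiency and Power entry above, the sketch below evaluates T(P_L) = I(P_L) + rE(P_L) for two options. It assumes I is implementation time, E is execution time per run, and r is the number of runs; the slide does not define these symbols, and all numbers are made up.

```python
def time_to_solution(impl_hours, exec_hours, runs):
    """T = I + r * E, with all times in hours (assumed units)."""
    return impl_hours + runs * exec_hours


def break_even_runs(impl_a, exec_a, impl_b, exec_b):
    """Number of runs at which options A and B have equal total time.

    Returns None when the per-run times are equal (no crossover).
    """
    if exec_a == exec_b:
        return None
    return (impl_b - impl_a) / (exec_a - exec_b)


# Made-up numbers: a high-level language (quick to write, slow to run)
# versus a low-level one (slow to write, quick to run).
high_level = dict(impl_hours=40.0, exec_hours=2.0)
low_level = dict(impl_hours=200.0, exec_hours=0.5)

for runs in (10, 100, 1000):
    t_high = time_to_solution(runs=runs, **high_level)
    t_low = time_to_solution(runs=runs, **low_level)
    print(runs, t_high, t_low)

print(break_even_runs(40.0, 2.0, 200.0, 0.5))
```

With these invented inputs the quick-to-write option wins for a small number of runs and loses once the run count passes the break-even point, which is the development time versus execution time tradeoff the models aim to quantify.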

Code Size and Reuse Cost


[Diagram labels: Lines of code, Function Points, Reuse, Re-engineering, Maintenance; New, Reused, Re-engineered, Maintained; Code Size]

Code size is measured in lines of code or function points (converted to lines of code)


Lines per function point

Language        Lines per function point
C, Fortran      ~100
Fortran77       ~100
C++             ~30
Java            ~30
Matlab          ~10
Python          ~10
Spreadsheet     ~5


HPC Challenge Areas

•  Function Points: High productivity languages not available on HPC

•  Reuse: Nonlinear reuse effects. Performance requirements dictate
“white box” reuse model



*  Code  size  is  the  most  important  software 
productivity  parameter 

*  Non-HPC  world  reduces  code  size  by 

-  Higher  level  languages 

-  Reuse 

*  HPC  performance  requirements  currently 
limit  the  exploitation  of  these  approaches 
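
A small sketch of the function point conversion mentioned on this slide, using the approximate lines-per-function-point values from the table above. The helper name `estimated_lines` and the 50 function point application are hypothetical, chosen only to show how language level changes code size.

```python
# Approximate lines of code per function point, as listed on the slide.
LINES_PER_FUNCTION_POINT = {
    "C": 100,
    "Fortran": 100,
    "Fortran77": 100,
    "C++": 30,
    "Java": 30,
    "Matlab": 10,
    "Python": 10,
    "Spreadsheet": 5,
}


def estimated_lines(function_points, language):
    """Convert a size in function points to approximate lines of code."""
    return function_points * LINES_PER_FUNCTION_POINT[language]


fp = 50  # hypothetical application size in function points
for language in ("Fortran", "C++", "Matlab"):
    print(language, estimated_lines(fp, language))
```

Under these table values the same functionality is roughly ten times larger in Fortran than in Matlab, which is the code size gap the slide attributes to language level.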


Activity & Purpose Benchmarks


[Figure: Activity & Purpose Benchmark. Legend: Purpose, Activity. Labels: Standard Interface; Run; Data Generation and Validation; Accuracy Data Points. Development Workflow stages: Algorithm Development; Spec; Design, Code, Test; Port, Scale, Optimize; Run.]



Activity Benchmarks define a set of instructions (i.e., source code) to be executed

Purpose Benchmarks define requirements, inputs and outputs

Together they address the entire development workflow
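
The distinction above can be sketched with a toy sorting task. In this hypothetical example the purpose side is a data generator plus a validator (inputs and required outputs), and the activity side is one concrete piece of source code that any implementation could replace. None of these function names come from the HPCS benchmarks.

```python
import random


def generate_data(n, seed=0):
    """Purpose side: defines the input (n pseudo-random integers)."""
    rng = random.Random(seed)
    return [rng.randrange(1_000_000) for _ in range(n)]


def validate(inputs, outputs):
    """Purpose side: defines the required output (a sorted permutation)."""
    return outputs == sorted(inputs)


def activity_sort(values):
    """Activity side: one specific set of instructions (insertion sort)."""
    result = []
    for v in values:
        i = len(result)
        # Walk left until the element before position i is not larger than v.
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result


data = generate_data(200)
print(validate(data, activity_sort(data)))
```

Because the generator and validator do not depend on how the sort is written, a different language or parallel model can be swapped in on the activity side and checked against the same purpose definition.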


HPCS Phase 1 Example Kernels and Applications


Mission Area | Kernels | Application | Source
Stockpile Stewardship | Random Memory Access; Unstructured Grids | UMT2000 | ASCI Purple Benchmarks
 | Eulerian Hydrocode; Adaptive Mesh | SAGE3D | ASCI Purple Benchmarks
 | Unstructured Finite; Adaptive Mesh Refinement | ALEGRA |
Operational Weather and Ocean Forecasting | Finite Difference Model | NLOM | DoD HPCMP TI-03
Army Future Combat Weapons Systems | Finite Difference Model; Adaptive Mesh Refinement | CTH | DoD HPCMP TI-03
Crashworthiness Simulations | Multiphysics Nonlinear Finite Element | LS-DYNA | Available to Vendors
Other Kernels | Lower / Upper Triangular Matrix Decomposition | LINPACK | Available on Web
 | Conjugate Gradient Solver | | DoD HPCMP TI-03
 | QR Decomposition | | Paper & Pencil for Kernels
 | 1D FFT | | Paper & Pencil for Kernels
 | 2D FFT | | Paper & Pencil for Kernels
 | Table Toy (GUP/s) | | Paper & Pencil for Kernels
 | Multiple Precision Mathematics | | Paper & Pencil for Kernels
 | Dynamic Programming | | Paper & Pencil for Kernels
 | Matrix Transpose [Binary manipulation] | | Paper & Pencil for Kernels
 | Integer Sort [With large multiword key] | | Paper & Pencil for Kernels
 | Binary Equation Solution | | Paper & Pencil for Kernels
 | Graph Extraction (Breadth First) Search | | Paper & Pencil for Kernels
 | Sort a large set | | Paper & Pencil for Kernels
 | Construct a relationship graph based on proximity | | Paper & Pencil for Kernels
 | Various Convolutions | | Paper & Pencil for Kernels
 | Various Coordinate Transforms | | Paper & Pencil for Kernels
 | Various Block Data Transfers | | Paper & Pencil for Kernels


Bio-Application | Kernels | Application | Source
Quantum and Molecular Mechanics | Macromolecular Dynamics; Energy Minimization; MonteCarlo Simulation | CHARMM | http://yuri.harvard.edu/
Whole Genome Analysis | Sequence Comparison | Needleman-Wunsch | http://www.med.nyu.edu/rcr/rcr/course/sim-sw.html
 | | BLAST | http://www.ncbi.nlm.nih.gov/BLAST/
 | | FASTA | http://www.ebi.ac.uk/fasta33/
 | | HMMR | http://hmmer.wustl.edu/
Systems Biology | Functional Genomics; Biological Pathway Analysis | BioSpice (Arkin, 2001) | http://genomics.lbl.gov/~aparkin/Group/Codebase.html


Set of scope benchmarks representing Mission Partner and emerging
Bio-Science high-end computing requirements
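
To make one of the listed kernels concrete, here is a minimal sketch of the Needleman-Wunsch sequence comparison named in the Bio-Application table, which is also an instance of the Dynamic Programming kernel. The scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration; the benchmark specification may use different ones.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b by dynamic programming.

    Keeps only two rows of the score table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against empty prefix of b.
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("GATTACA", "GATTCA"))
```

The nested loop over both sequences is the part a parallel implementation would need to restructure, since each cell depends on its upper, left, and diagonal neighbors.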



Outline 


*  Introduction 

*  Workflows 

*  Metrics 

*  Models  &  Benchmarks 

*  Schedule  and  Summary 

Phase II Productivity Forum
Tasks and Schedule


Task (Communities)

-  Workflow Models (Lincoln/HPCMO/LANL)

-  Dev Time Experiments (UMD /)

-  Dev & Exe Interfaces (HPC SW/FFRDC)

-  A&P Benchmarks (Missions/FFRDC)

-  Unified Model Interface (HPC Modelers)

-  HPC Productivity Competitiveness Council

[Schedule chart: timeline from FY03 Q3-Q4 through FY06 Q3-Q4. Legible chart labels: Execution Framework Development; Competing Development Time Models; Analyze Existing, Design Exp, Pilot Studies; Controlled Baseline Experiments; Mission Specific & New Technology Demonstrations; Prototype Interfaces (v0.1), (version 0.5), (version 1.0); Reqs & Spec (~6) & Exe Spec (~2); Revise & Exe Spec (~2), shown twice; Data; Workflows (Intelligence, Weapons Design, Surveillance, Environment, Bioinformatics); Productivity Workshops; Productivity Evaluations; Roll Out Productivity Metrics; Validated Dev Time Assessment Methodology; Validated Exe Time Assessment Methodology; Broad Commercial Acceptance]

Summary


*  Goal  is  to  develop  an  acquisition  quality  framework  for  HPC 
systems  that  includes 

-  Development  time 

-  Execution  time 


*  Have  assembled  a  team  that  will  develop  models,  analyze 
existing  HPC  codes,  develop  tools  and  conduct  HPC 
development  time  and  execution  time  experiments 

*  Measures  of  success 

-  Acceptance  by  users,  vendors  and  acquisition  community 

-  Quantitatively explain HPC rules of thumb:

"OpenMP is easier than MPI, but doesn't scale as high"
"UPC/CAF is easier than OpenMP"
"Matlab is easier than Fortran, but isn't as fast"

-  Predict  impact  of  new  technologies 

HPCS Phase II Teams


Industry:

PI: Elnozahy
PI: Gustafson
PI: Smith

Goal:

>  Provide a new generation of economically viable high productivity computing
systems for the national security and industrial user community (2007 - 2010)


Productivity Team (Lincoln Lead)

Organizations legible in the slide logos: MIT Lincoln Laboratory, MITRE, Los Alamos National Laboratory, UCSB, CodeSourcery

PI: Kepner
PI: Lucas
PI: Basili
PI: Benson & Snavely
PI: Koester
PIs: Vetter, Lusk, Post, Bailey
PIs: Gilbert, Edelman, Ahalt, Mitchell

Goal:

>  Develop a procurement quality assessment methodology that will be the basis
of 2010+ HPC procurements



Productivity  Framework  Overview 



Phase I: Define Framework & Scope Petascale Requirements

Phase II: Implement Framework & Perform Design Assessments

Phase III: Transition To HPC Procurement Quality Framework


Value Metrics

-  Execution

-  Development

Workflows

-  Production

-  Enterprise

-  Researcher

Benchmarks

-  Activity

-  Purpose

[Figure: legible labels include Evaluation Experiments; Acceptance Level; Preliminary Multilevel System Models & Prototypes; Final Multilevel System Models & SN001; participants HPCS Vendors, HPCS FFRDC & Gov R&D Partners, Mission Agencies, Commercial or Nonprofit Productivity Sponsor]


HPCS  needs  to  develop  a  procurement  quality  assessment 
methodology  that  will  be  the  basis  of  2010+  HPC  procurements 




